What is it?


Using corpora to teach languages is nothing new and, while the term corpus 
linguistics hails from the 1940s, most language learning before the 20th century 
adopted a corpus approach — using a series of texts in the language under study 
as a type of corpus on which to base acquisition. With the advent of widespread 
computing in the latter half of the 20th century, corpora began to be digitised, 
rendering interrogation of large amounts of data a much simpler and more 
appealing prospect. Today, languages in all forms (written, spoken, performed, 
formal, informal, etc.) are captured all the time through online and digital 
platforms, apps, etc., meaning that the wealth of language data literally at our 
fingertips is enormous. This has triggered the development of appropriate tools 
to explore these vast data sets. 


For language teaching and learning the possibilities fall into two categories: using 
existing corpora or creating your own corpora. A good place to start exploring 
language corpora is Sketch Engine (https://www.sketchengine.eu/corpora-and-languages/). 
You can sign up for a free 30-day trial and access all functions, 
featured corpora for all languages, as well as the corpus building capacities. 
Chapter 15. Digital corpora


This leads to the second type of activity: creating corpora. Apart from Sketch 
Engine, another relatively accessible option is #LancsBox 
(http://corpora.lancs.ac.uk/lancsbox/), which allows you either to interact with 
existing corpora or to create your own. 


Why use corpora? Applying corpora in your teaching and learning can support 
activities which involve inductive learning: analysing language to work out 
how something works, particularly in context. Utilising digital corpora, either 
those already available or creating your own customised corpora, streamlines 
this process as you can instantaneously produce all instances of, say, a 
particular grammatical feature or see how a word is used. You can also apply 
this to text types or genres — for instance, what do newspaper articles do that 
is different to short stories or how do people make doctor’s appointments over 
the phone compared to making a hair appointment? Many online language 
sites, such as Reverso Context (https://context.reverso.net/translation/), 
take a corpus approach. 


Example 


A constant stumbling block for learners of Italian is the choice of preposition. 
This often comes from the simplistic one-to-one translations presented in 
language textbooks, manuals, etc. In order to sensitise students to the importance 
of context in the correct selection of prepositions, I devised an exercise which 
used a small corpus created from the two literary texts that were under study 
at the time — I thought this would be useful pedagogically, since the students 
were already reading these texts and therefore would approach the task with 
less anxiety and more familiarity. I imported the texts into #LancsBox to create 
the corpus and then created lists of concordances (Figure 1) which showed 
the prepositions a, di, and da in context. I gave students a table (Figure 2) to 
complete which helped guide their mining of the data. Essentially, they had 
to transpose the occurrences of the preposition from the original concordance 
lists of contexts from the texts in question into columns which showed the 
diverse functions of the prepositions: e.g. locative, genitive, introducing an 
infinitive, etc. 
Figure 1. Example of list of concordances of preposition di


| Verbi che vogliono di quando seguiti da un infinito | Espressioni che vogliono di quando seguite da un infinito | Espressioni idiomatiche con di | Uso possessivo | Altri usi |
| Per es.: ha finito di mangiare | Per es.: sono capace di farlo | Per es.: di buon’ora | Per es.: il libro di lei | Da definire |
Figure 2. Example of table for students to complete: la preposizione di 
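
For readers who would like to see what a concordance list involves without installing corpus software, the following is a minimal sketch in Python of a keyword-in-context listing of the kind described above. It is not how #LancsBox works internally; the function name `concordance`, the `width` parameter, and the three-sentence sample (built from the example phrases in Figure 2) are assumptions made purely for illustration.

```python
import re


def concordance(text, keyword, width=30):
    """Return keyword-in-context lines for every whole-word match of keyword."""
    # Collapse line breaks and repeated spaces so the context windows stay on one line.
    text = " ".join(text.split())
    # \b restricts matches to whole words, so "di" does not match inside "dire".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Hypothetical sample text assembled from the example phrases in Figure 2.
sample = "Ha finito di mangiare. Sono capace di farlo. Il libro di lei è qui."

for line in concordance(sample, "di"):
    print(line)
```

Each printed line shows one occurrence of the preposition with its surrounding words, which is the raw material students would then sort into the columns of the table. In practice the sample string would be replaced by the full text of the works under study.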
Benefits


Existing language corpora provide endless examples of language in context in 
diverse registers, genres, time periods, and text dimensions. While predominantly 
text-based, there are also corpora of recorded language whether spontaneous, 
televised/broadcast, or scripted. Importantly, the work of constructing these 
language banks has already been done (and continues). 


For those with developed Information Technology (IT) literacy, corpora tools 
offer a lot of scope for exploration of language and data-driven learning. Teachers 
can custom-build their own corpora or customise existing corpora. Students too 
can be instructed to use corpora tools to investigate how language works through 
accessing large arrays of exemplar texts. 


Potential issues 


The most glaring issue with digital corpora is technology. Corpus linguistics is 
the province of computer scientists and linguists and, while software tools are 
becoming more user friendly, building and interrogating corpora still require a 
significant effort even for those with reasonable IT skills. 


In the example above, I decided to avoid wrestling with students’ capacity to 
use the software to access the corpus and instead provided them with an excerpt 
myself. This was largely because, the year before, I had asked the previous group 
of students to download the software, read the manual, load the corpus, and then 
carry out various tasks, which proved beyond the majority of them. The 
focus of my class was not corpus linguistics; this was simply a different way 
to approach the study of Italian. In some respects, then, it is too much to expect 
language students to (want to) learn how to use digital corpora. Additional 
issues relate to accessibility of digital corpora which might be problematic for 
students with learning or physical disabilities, or limited access to technology. 
Finally, not all languages have the same number or variety of corpora readily 
available online. 
Looking to the future


There is no doubt that we will continue to amass massive amounts 
of (language) data. It is also the case that digital corpora will lead 
to more nuanced development of translation and AI-supported 
[...] tools. Taking advantage of these developments and 
accessing digital corpora to support language learning, both in and 
outside formal settings, offers great potential for our students to 
experience languages in all their g